Enhance experiment runner with deterministic controls by buzypi · Pull Request #193 · karpathy/autoresearch

buzypi · 2026-03-11T20:37:16Z

Summary

This PR adds an execution workflow for autonomous experiments, replacing session-by-session interpretation of program.md with a deterministic runner plus an agent runbook.

What’s included

Add workflows/run_experiment.py as the single experiment orchestrator:
- start, resume, status commands
- top-level stage controls: setup, baseline, loop
- loop sub-stage controls: propose, apply, commit, train, triage, record, decide
- resumable checkpointing under workflows/runs/<run_id>/
- run-id policy: <branch-slug>-rNNN
Add AGENTS.md runbook with explicit natural-language to command mapping for agent sessions.
Setup robustness:
- auto-run uv run prepare.py when cache/tokenizer is missing (default on, opt-out via --no-auto-prepare)
- explicit setup precondition checks before baseline/loop
Background training support:
- training stages start in background by default (--background-train)
- resume polls/continues in-flight baseline/train jobs
Human intervention support:
- proposal override via run-scoped workflows/runs/<run_id>/next_proposal.json
- proposal override via explicit --proposal-file <path> on start/resume
- canonical override schema at workflows/schemas/proposal.schema.json
- deterministic precedence in propose stage: --proposal-file -> next_proposal.json -> stochastic proposal -> deterministic fallback
- consumed run-scoped proposals are archived to workflows/runs/<run_id>/consumed_proposals/iter_<NNNN>.json
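The propose-stage precedence listed above can be sketched roughly as follows. This is a minimal illustration, not the code in workflows/run_experiment.py: the function name `choose_proposal`, the `stochastic` and `fallback` callables, and the returned source labels are all assumptions made for the example. Archiving of consumed proposals is left out.

```python
import json
from pathlib import Path
from typing import Callable, Optional, Tuple


def choose_proposal(
    run_dir: Path,
    proposal_file: Optional[Path],
    stochastic: Callable[[], Optional[dict]],
    fallback: Callable[[], dict],
) -> Tuple[str, dict]:
    """Return (source, proposal) using the precedence from the PR description.

    All names here are hypothetical; only the ordering comes from the PR.
    """
    # 1. An explicit --proposal-file wins over everything else.
    if proposal_file is not None:
        return "proposal-file", json.loads(proposal_file.read_text())

    # 2. Next, a run-scoped next_proposal.json, if one exists.
    run_scoped = run_dir / "next_proposal.json"
    if run_scoped.exists():
        return "next_proposal", json.loads(run_scoped.read_text())

    # 3. Then the stochastic generator; None is taken to mean "nothing produced".
    proposal = stochastic()
    if proposal is not None:
        return "stochastic", proposal

    # 4. Last resort: the deterministic fallback, assumed to always return a proposal.
    return "fallback", fallback()
```

Returning the source alongside the proposal is one way to keep the decision inspectable in the run history; whether the actual runner records it this way is not stated in the PR.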

Why

In long-running autonomous sessions, prose-only execution is fragile and inconsistent; there have been multiple complaints about this on X.
This PR makes runs repeatable, resumable, inspectable, and human-steerable while preserving program.md as the policy/objective layer.

buzypi · 2026-03-11T20:44:22Z

Basic End-to-End Flow (via OpenCode)

Start OpenCode in repo root:

opencode

Ask it to start a run:

Type this in OpenCode: "Start running the experiment, run 5 loops"

It maps to:

python workflows/run_experiment.py start --loops 5

Continue later:

Type this in OpenCode: "Run another 5 iterations"

It maps to:

python workflows/run_experiment.py resume --loops 5

Check status:

Training runs happen in the background, and the agent sleeps for an interval that it determines itself. We can interrupt it and ask questions like: "Show the run status".

It maps to:

python workflows/run_experiment.py status

We can ask other questions too, like "Tell me about the results so far" or "What is the GPU usage", and then ask it to resume its work.

Human-in-the-Loop Override

If you want to inject your own experiment idea instead of stochastic proposal generation:

Type this in OpenCode: "In the next iteration, increase the LR by 10% and keep warmup unchanged. Use this as a human proposal and run 1 loop."
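Without an agent, the same override can be dropped in by hand as a run-scoped next_proposal.json. The sketch below shows one way to write that file. The field names in `EXAMPLE_PROPOSAL` are hypothetical, since the canonical schema in workflows/schemas/proposal.schema.json is not reproduced here, and the helper name `write_next_proposal` is likewise only illustrative.

```python
import json
from pathlib import Path

# Hypothetical field names for illustration only; the real ones are defined in
# workflows/schemas/proposal.schema.json.
EXAMPLE_PROPOSAL = {
    "description": "Increase LR by 10% and keep warmup unchanged",
    "source": "human",
}


def write_next_proposal(run_dir: Path, proposal: dict) -> Path:
    """Write a proposal to <run_dir>/next_proposal.json and return its path."""
    run_dir.mkdir(parents=True, exist_ok=True)
    target = run_dir / "next_proposal.json"
    # Write to a temporary file and rename it, so a reader never sees a
    # half-written proposal.
    tmp = target.with_suffix(".json.tmp")
    tmp.write_text(json.dumps(proposal, indent=2) + "\n", encoding="utf-8")
    tmp.replace(target)
    return target
```

With the run directory set to workflows/runs/<run_id>/, a subsequent `python workflows/run_experiment.py resume --loops 1` should pick the file up in the propose stage, assuming no --proposal-file is passed.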

Useful Stage-Control Examples (Direct)

If you want to go headless, you can call the runner directly:

Setup + baseline only:

python workflows/run_experiment.py start --only setup,baseline

Run only selected loop internals for 3 iterations:

python workflows/run_experiment.py resume --loops 3 --only loop --loop-only train,record,decide

Foreground mode (disable background training):

python workflows/run_experiment.py resume --loops 1 --no-background-train
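For readers wondering how comma-separated stage filters like the ones above might be parsed, here is a minimal argparse sketch. The command, flag, and stage names come from this PR; everything else is an assumption for illustration and may differ from the actual runner, including the default of one loop, the canonical reordering of stages, and the helper names.

```python
import argparse

# Stage names as listed in the PR description.
TOP_STAGES = ("setup", "baseline", "loop")
LOOP_STAGES = ("propose", "apply", "commit", "train", "triage", "record", "decide")


def stage_list(allowed):
    """Build an argparse type that parses 'a,b,c' and validates the names."""
    def parse(value: str):
        names = [n.strip() for n in value.split(",") if n.strip()]
        unknown = [n for n in names if n not in allowed]
        if not names or unknown:
            raise argparse.ArgumentTypeError(
                f"expected a comma-separated subset of {', '.join(allowed)}; got {value!r}"
            )
        # Assumption: stages always run in canonical order, whatever order was typed.
        return [s for s in allowed if s in names]
    return parse


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="run_experiment.py")
    sub = parser.add_subparsers(dest="command", required=True)
    for name in ("start", "resume"):
        p = sub.add_parser(name)
        p.add_argument("--loops", type=int, default=1)  # default is an assumption
        p.add_argument("--only", type=stage_list(TOP_STAGES), default=list(TOP_STAGES))
        p.add_argument("--loop-only", type=stage_list(LOOP_STAGES), default=list(LOOP_STAGES))
    sub.add_parser("status")
    return parser
```

Validating stage names at parse time means a typo such as `--loop-only trian` fails immediately instead of silently skipping a stage.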

Logs and Artifacts

Run outputs are written to:

workflows/runs/<run_id>/runner.log (human-readable timeline)
workflows/runs/<run_id>/history.jsonl (structured events)
workflows/runs/<run_id>/state.json (checkpoint state)
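Since history.jsonl holds structured events, it can be inspected with a few lines of Python. The sketch below assumes only that the file is in JSON Lines form (one JSON value per line). It makes no assumption about event fields, which are not documented here, and `iter_events` is a hypothetical helper name.

```python
import json
from pathlib import Path
from typing import Any, Iterator


def iter_events(history_path: Path) -> Iterator[Any]:
    """Yield one parsed JSON value per non-empty line of a JSON Lines file."""
    with history_path.open(encoding="utf-8") as fh:
        for line in fh:
            line = line.strip()
            if not line:
                continue
            try:
                yield json.loads(line)
            except json.JSONDecodeError:
                # A partial line may appear while the runner is still writing
                # it; skip rather than fail.
                continue
```

For example, `list(iter_events(Path("workflows/runs") / run_id / "history.jsonl"))[-5:]` would give the last five events of a run, where `run_id` follows the <branch-slug>-rNNN policy.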

add resumable workflow runner and agent runbook

9e981e9

buzypi mentioned this pull request Mar 11, 2026

Enhance experiment runner with deterministic controls #148

Closed

buzypi wants to merge 1 commit into karpathy:master from buzypi:pr/workflow-runner-only
